Unstandardized Accounting Terminology

Dictionaries


Overview

We construct alternative accounting dictionaries to capture the universe of accounting terminology and measure standardization in financial reporting. Each dictionary consists of two components:

  • Term Lists: Unique accounting terms that appear in financial reports
  • Concept Lists: Granular mappings showing which terms are used to describe the same underlying accounting concepts (i.e., synonyms)

Construction Approaches

Top-Down (Authoritative Sources): We collect terms from IFRS, US GAAP, and UK GAAP standards, plus specialized accounting dictionaries and the EU’s IATE database. Terms explicitly classified as synonyms are grouped by their underlying concepts. All lists are refined using GPT-based validation and manual checks, then restricted to terminology actually observed in our global corpus of financial reports.

Bottom-Up (XBRL Filings): We extract terms directly from financial statements by parsing XBRL filings on EDGAR. Specifically, we use the Exhibit 101.LAB files, which map XBRL taxonomy tags to the natural-language labels firms actually use in their reports. This captures real-world variation in reporting practice. To reduce noise, we require that 10-K terms appear in at least 20 distinct filings and 20-F terms in at least 5 distinct filings. We then apply a majority disambiguation rule, removing terms that appear in fewer than 5% of filings for a given concept.
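The two bottom-up filters can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used to build the dictionaries: the input layout (one `(filing_id, concept, term)` tuple per label occurrence) and the function name `filter_terms` are hypothetical, and the 5% rule is read here as the share of a concept's filings in which a given term labels that concept.

```python
from collections import defaultdict

# Minimum number of distinct filings per form type, as stated in the text.
MIN_FILINGS = {"10-K": 20, "20-F": 5}
# Minimum share of a concept's filings in which a term must appear.
MIN_SHARE = 0.05


def filter_terms(records, form_type, min_share=MIN_SHARE):
    """Return sorted (term, concept) pairs that pass both filters.

    records: iterable of (filing_id, concept, term) tuples for one form type
             (hypothetical layout, assumed for this sketch).
    """
    term_filings = defaultdict(set)     # term -> filings that use it
    concept_filings = defaultdict(set)  # concept -> filings that report it
    pair_filings = defaultdict(set)     # (term, concept) -> filings

    for filing_id, concept, term in records:
        term_filings[term].add(filing_id)
        concept_filings[concept].add(filing_id)
        pair_filings[(term, concept)].add(filing_id)

    min_filings = MIN_FILINGS[form_type]
    kept = []
    for (term, concept), filings in pair_filings.items():
        # Filter 1: the term must appear in enough distinct filings.
        if len(term_filings[term]) < min_filings:
            continue
        # Filter 2: the term must label the concept in at least
        # min_share of the filings that report the concept.
        share = len(filings) / len(concept_filings[concept])
        if share < min_share:
            continue
        kept.append((term, concept))
    return sorted(kept)
```

Under this reading, a term that is common overall but rarely attached to a particular tag is dropped from that concept only, while it remains in the concepts where it is used more often.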

Term Lists

Term lists provide the complete set of unique accounting terms found in financial reports. These are useful for text analysis, dictionary-based approaches, and understanding the breadth of accounting vocabulary.

Download all term lists: 📥 Excel File (2.6 MB)

  • Top-Down
    Source: IFRS, US GAAP, UK GAAP standards, and specialized accounting dictionaries

  • Bottom-Up (10-K)
    Source: ~50,000 U.S. 10-K XBRL filings (2009-2025)

  • Bottom-Up (20-F)
    Source: 20-F XBRL filings from non-U.S. firms using IFRS Taxonomy (2009-2025)


Concept Lists

Concept lists map individual terms to accounting concepts, revealing which terms are used interchangeably (i.e., as synonyms) to describe the same accounting concept. Each row represents a unique term-concept pairing.
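As an illustration of how such a list can be used, the sketch below groups term-concept rows into synonym sets. It assumes each row is available as a `(term, concept)` tuple; the helper names `group_by_concept` and `synonyms_of` and the example terms are hypothetical, and the layout of the actual Excel file may differ.

```python
from collections import defaultdict


def group_by_concept(rows):
    """Map each concept to the set of terms paired with it."""
    concepts = defaultdict(set)
    for term, concept in rows:
        concepts[concept].add(term)
    return dict(concepts)


def synonyms_of(term, rows):
    """Return all other terms that share at least one concept with `term`."""
    found = set()
    for terms in group_by_concept(rows).values():
        if term in terms:
            found |= terms - {term}
    return found


# Hypothetical rows: three terms for one concept, one term for another.
rows = [
    ("revenue", "C1"),
    ("turnover", "C1"),
    ("sales", "C1"),
    ("goodwill", "C2"),
]
print(synonyms_of("turnover", rows))
```

A concept with a single term has no synonyms in the list, so `synonyms_of("goodwill", rows)` returns an empty set.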

Download all concept lists: 📥 Excel File

  • Top-Down
    Construction: Terms from dictionaries and standards explicitly classified as synonyms are grouped into concepts. Concepts are validated using graph theory (complete graph property) and GPT-based checks to ensure all terms within a concept are truly interchangeable.
    Structure: Each row shows a term (TID) and its associated concept (CID), along with the n-gram count.

  • Bottom-Up (10-K)
    Construction: Terms are grouped by XBRL taxonomy tags, where each tag represents a distinct accounting concept. Terms linked to multiple tags are assigned to their primary concept using a majority rule (5% threshold).
    Structure: Each row shows which term (TID) maps to which XBRL concept (CID). This reveals how U.S. domestic filers describe the same accounting items using different terminology.

  • Bottom-Up (20-F)
    Construction: The methodology is the same as for 10-K filings, but uses IFRS Taxonomy tags from 20-F filings. This captures how international filers describe accounting concepts.
    Structure: Each row shows term-concept mappings based on the IFRS Taxonomy, revealing cross-border variation in financial reporting language.
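The complete graph property mentioned for the top-down concepts can be illustrated with a short check: treat each source-level synonym statement as an edge between two terms, and require that every pair of terms inside a concept is connected. This is a sketch of one plausible reading of that validation step; the function names and the example pairs are hypothetical.

```python
from itertools import combinations


def missing_pairs(terms, synonym_pairs):
    """List term pairs within a concept that lack a direct synonym link."""
    # Store edges as frozensets so that (a, b) and (b, a) are the same edge.
    edges = {frozenset(pair) for pair in synonym_pairs}
    return [
        (a, b)
        for a, b in combinations(sorted(terms), 2)
        if frozenset((a, b)) not in edges
    ]


def is_complete(terms, synonym_pairs):
    """True if the concept's terms form a complete graph (a clique)."""
    return not missing_pairs(terms, synonym_pairs)


# Hypothetical example: "revenue" and "sales" are never linked directly,
# so the three terms do not form a complete graph.
concept_terms = {"revenue", "turnover", "sales"}
pairs = [("revenue", "turnover"), ("turnover", "sales")]
print(is_complete(concept_terms, pairs))
print(missing_pairs(concept_terms, pairs))
```

A concept that fails this check contains at least two terms that are only linked through a third term, which is the situation where interchangeability is least certain and a manual or GPT-based review is most useful.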


t-SNE Visualizations

Explore the semantic structure of individual concepts using t-SNE projections. Each plot shows how terms within a concept cluster together in two-dimensional embedding space.

[Interactive t-SNE plot, selectable by dictionary, model, and concept]


 

© 2025 | Supplementary materials for JAE submission